Improving Word Alignment Based on Extended Inversion Transduction Grammar

Authors

  • Chung-Chi Huang
  • Wei-Teh Chen
  • Jason S. Chang
Abstract

We propose a fusion of the Inversion Transduction Grammar (ITG) model with the IBM-style notion of fertility to improve word-alignment performance. In our approach, binary context-free grammar rules on the source language, accompanied by orientation preferences on the target language and by word fertilities, are leveraged to construct a syntax-based statistical translation model. Our model inherently obeys the ITG restrictions while allowing many consecutive words to be aligned to one word and vice versa. It outperforms both the original ITG model and GIZA++, not only in alignment error rate (23% and 14% error reduction, respectively) but also in consistent phrase error rate (13% and 9% error reduction, respectively). Better performance on these two evaluation metrics is likely to lead to better phrase-based machine translation.
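
The abstract reports alignment error rate (AER) without defining it. The sketch below assumes the definition commonly used in the word-alignment literature, AER = 1 - (|A ∩ S| + |A ∩ P|) / (|A| + |S|), where A is the predicted link set, S the "sure" gold links, and P the "possible" gold links (with S treated as a subset of P). The function name and the toy links are hypothetical, and the paper's own evaluation setup may differ.

```python
from typing import Set, Tuple

Link = Tuple[int, int]  # (source word index, target word index)


def alignment_error_rate(predicted: Set[Link],
                         sure: Set[Link],
                         possible: Set[Link]) -> float:
    """AER under the assumed standard definition (lower is better)."""
    # Sure links count as possible links as well.
    possible = possible | sure
    denominator = len(predicted) + len(sure)
    if denominator == 0:
        raise ValueError("AER is undefined when both link sets are empty")
    matched = len(predicted & sure) + len(predicted & possible)
    return 1.0 - matched / denominator


# Toy example (made-up links, not data from the paper).
predicted_links = {(0, 0), (1, 1), (2, 3)}
sure_links = {(0, 0), (1, 1)}
possible_links = {(2, 2)}
toy_aer = alignment_error_rate(predicted_links, sure_links, possible_links)
```

In the toy example, two of the three predicted links match gold links, so the score is 1 - 4/5.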

Similar resources

A Systematic Comparison between Inversion Transduction Grammar and Linear Transduction Grammar for Word Alignment

We present two contributions to grammar-driven translation. First, since both Inversion Transduction Grammars and Linear Inversion Transduction Grammars have been shown to produce better alignments than the standard word alignment tool, we investigate how the trade-off between speed and end-to-end translation quality extends to the choice of grammar formalism. Second, we prove that Linear Transd...

Fertility-based Source-Language-biased Inversion Transduction Grammar for Word Alignment

We propose a version of the Inversion Transduction Grammar (ITG) model with the IBM-style notion of fertility to improve word-alignment performance. In our approach, binary context-free grammar rules of the source language, accompanied by orientation preferences of the target language and fertilities of words, are leveraged to construct a syntax-based statistical translation model. Our model, inheren...

Word Alignment with Stochastic Bracketing Linear Inversion Transduction Grammar

The class of Linear Inversion Transduction Grammars (LITGs) is introduced and used to induce a word alignment over a parallel corpus. We show that alignment via Stochastic Bracketing LITGs is considerably faster than via Stochastic Bracketing ITGs, while still yielding alignments superior to the widely used heuristic of intersecting bidirectional IBM alignments. Performance is measured as the trans...

Dealing with Spurious Ambiguity in Learning ITG-based Word Alignment

Word alignment has an exponentially large search space, which often makes exact inference infeasible. Recent studies have shown that inversion transduction grammars are reasonable constraints for word alignment, and that the constrained space could be efficiently searched using synchronous parsing algorithms. However, spurious ambiguity may occur in synchronous parsing and cause problems in bot...

Improving Phrase-Based Translation via Word Alignments from Stochastic Inversion Transduction Grammars

We argue that learning word alignments through a compositionally-structured, joint process yields higher phrase-based translation accuracy than the conventional heuristic of intersecting conditional models. Flawed word alignments can lead to flawed phrase translations that damage translation accuracy. Yet the IBM word alignments usually used today are known to be flawed, in large part because I...

Publication date: 2007